Data Source:

  1. Spotify Daily Top Tracks
    (Spotify API)

The Spotify Daily Top Tracks playlist is updated daily with the tracks that are currently most played in the USA.


Questions/Hypotheses:

    1. Does the time period affect the popularity (proportion of streams) of the pop, rap, hip hop, r&b, and rock genres in the Spotify US daily charts?
      1. Whether or not it is a weekday or weekend
      2. Whether or not it is a holiday
      3. Season (summer, winter, spring, fall)
    2. Did the popularity of happy songs (mean valence) in the top 200 Spotify US daily streams change during Covid?
      1. Before March 13, 2020 (exclusive)
      2. After March 13, 2020 (inclusive)
    3. What parameters are the most important in predicting the popularity on Spotify in the US?
      1. A track's popularity is calculated by summing 201 - Rank (i.e. rank 1 scores 200, rank 200 scores 1) over all the days the track appears in the Spotify Top 200 streams
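
The popularity score defined above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the function name `popularity_scores`, the `(date, rank, track_id)` row layout, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def popularity_scores(chart_rows):
    """Sum (201 - rank) over every day a track appears in the Top 200."""
    scores = defaultdict(int)
    for _date, rank, track_id in chart_rows:
        if not 1 <= rank <= 200:
            raise ValueError(f"rank out of range: {rank}")
        scores[track_id] += 201 - rank
    return dict(scores)


# Hypothetical chart rows: (date, rank, track_id).
rows = [
    ("2020-03-12", 1, "track_a"),
    ("2020-03-12", 200, "track_b"),
    ("2020-03-13", 2, "track_a"),
]
scores = popularity_scores(rows)
print(scores)  # {'track_a': 399, 'track_b': 1}
```

Under this definition, a track that stays at a middling rank for many days can outscore a track that reaches rank 1 for only a day or two.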

Methodology:

Question 1
(Time period and proportion of genres played)
  • Use Spotify API to pull 2017-2021 US daily top track data
  • Join Spotify daily tracks data with Artist, Genre, and Tracks tables
  • Use a two-sided, two-sample, large-sample Z-test to answer subquestions 1 (weekday vs. weekend) and 2 (holiday vs. non-holiday)
    • Perform the above test for each of the following genres: pop, rap, hip hop, r&b, and rock
  • Use one-way ANOVA to answer subquestion 3, testing whether a genre's proportion of streams differs across the four seasons
    • Perform the above test for each of the following genres: pop, rap, hip hop, r&b, and rock
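
As a rough sketch of the two tests for a single genre, the code below treats each day's genre share of Top 200 streams as one observation. That framing, the meteorological season mapping, the helper names, and the synthetic data are all assumptions for illustration. Only the F statistic is computed here; a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
from datetime import date, timedelta
from statistics import NormalDist, fmean, variance

SEASONS = ("winter", "spring", "summer", "fall")


def is_weekend(day):
    # Monday is 0, so Saturday and Sunday are 5 and 6.
    return day.weekday() >= 5


def season_of(day):
    # Assumption: meteorological seasons (Dec-Feb is winter, and so on).
    return SEASONS[(day.month % 12) // 3]


def two_sample_z_test(a, b):
    """Two-sided large-sample z-test for a difference in means."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA (between / within mean squares)."""
    values = [x for g in groups for x in g]
    grand = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Synthetic daily pop share (not real Spotify data), with a built-in
# weekend bump of 0.02 so the z-test has something to detect.
start = date(2021, 1, 1)
daily_share = {}
for i in range(364):
    day = start + timedelta(days=i)
    daily_share[day] = 0.30 + 0.01 * (i % 5) + (0.02 if is_weekend(day) else 0.0)

weekend = [s for d, s in daily_share.items() if is_weekend(d)]
weekday = [s for d, s in daily_share.items() if not is_weekend(d)]
z, p = two_sample_z_test(weekend, weekday)

by_season = {}
for d, s in daily_share.items():
    by_season.setdefault(season_of(d), []).append(s)
f_stat = one_way_anova_f(list(by_season.values()))

print(f"z = {z:.2f}, p = {p:.4f}, F = {f_stat:.2f}")
```

The holiday comparison would reuse `two_sample_z_test` with a holiday flag in place of `is_weekend`. Because each of the five genres gets its own test, a multiple-comparison correction may be worth considering.
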
Question 2
(happiness of songs during Covid)
  • Same dataset used in Question 1
  • Use a two-sided, two-sample, large-sample Z-test to compare mean valence before and after March 13, 2020
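
For reference, one common form of the test statistic is written out below. The notation is an assumption introduced here: \(\bar{v}\) is the mean valence in a period, \(s^2\) the sample variance, \(n\) the number of observations, and \(\Phi\) the standard normal CDF. The "before" group covers dates strictly earlier than March 13, 2020, and the "after" group covers March 13, 2020 onward.

\[
H_0:\ \mu_{\text{before}} = \mu_{\text{after}}
\qquad
H_1:\ \mu_{\text{before}} \neq \mu_{\text{after}}
\]
\[
z = \frac{\bar{v}_{\text{after}} - \bar{v}_{\text{before}}}
         {\sqrt{\dfrac{s_{\text{before}}^2}{n_{\text{before}}} + \dfrac{s_{\text{after}}^2}{n_{\text{after}}}}},
\qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]

At a significance level of 0.05, \(H_0\) is rejected when \(|z| > 1.96\).
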
Question 3
(parameters for popularity)
  • A track's popularity is calculated by summing \(201 - \text{Rank}\) (i.e. rank 1 scores 200, rank 200 scores 1) over all the days the track appears in the Spotify Top 200 streams
  • Use Linear Regression with all the features as a baseline model
    • Possible features: acousticness, danceability, duration, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, explicit, time_signature, valence, genre
    • Predicted value: calculated popularity
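  • Remove features one by one to improve the adjusted (or held-out) \(R^2\) score, since the in-sample \(R^2\) of a linear regression cannot increase when a feature is removed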
  • Consider adding interaction effects and check how they change the performance of the linear regression model. Note, however, that interaction effects make the model harder to interpret
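
The baseline-then-prune procedure above might look like the following sketch. It uses adjusted \(R^2\) as the selection criterion, which is an assumption: the in-sample \(R^2\) can never rise when a feature is dropped. The helper names (`solve`, `adjusted_r2`, `backward_eliminate`) and the synthetic rows are hypothetical, and ordinary least squares is solved by hand only to keep the example dependency-free. In practice a library such as scikit-learn's `LinearRegression` would be the usual choice, and categorical features such as key or genre would need encoding first.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(rows, y, features):
    """Fit OLS with an intercept on `features`; return adjusted R^2."""
    design = [[1.0] + [row[f] for f in features] for row in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n, p = len(y), len(features)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def backward_eliminate(rows, y, features):
    """Drop one feature at a time while doing so raises adjusted R^2."""
    current, best = list(features), adjusted_r2(rows, y, features)
    while len(current) > 1:
        trials = [
            (adjusted_r2(rows, y, [f for f in current if f != d]), d)
            for d in current
        ]
        score, drop = max(trials)
        if score <= best:
            break
        current.remove(drop)
        best = score
    return current, best


# Synthetic rows (not real Spotify data): popularity depends on energy
# and valence; "filler" is an unrelated feature.
rows, y = [], []
for i in range(20):
    row = {
        "energy": (i % 5) / 4,
        "valence": (i % 4) / 3,
        "filler": ((i * 7) % 11) / 10,
    }
    residual = ((i * 3) % 7 - 3) * 0.5
    rows.append(row)
    y.append(100 * row["energy"] + 50 * row["valence"] + residual)

all_features = ["energy", "valence", "filler"]
full_score = adjusted_r2(rows, y, all_features)
kept, best_score = backward_eliminate(rows, y, all_features)
print(kept, round(full_score, 4), round(best_score, 4))
```

An interaction effect can be tried in the same framework by adding a product column, for example `row["energy"] * row["valence"]`, as an extra feature before fitting.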